perm filename READA[B2,JMC]3 blob
sn#769733 filedate 1984-09-18 generic text, type C, neo UTF8
COMMENT ⊗ VALID 00012 PAGES
C REC PAGE DESCRIPTION
C00001 00001
C00002 00002 \chapter{Introduction to Lisp}
C00003 00003 \section{The basic ideas of pure \lisp.}
C00019 00004 \section{Atoms.}
C00022 00005 \section{Lists.}
C00029 00006 \section{Lists as S-expressions}
C00036 00007 \section{List structures.}
C00047 00008 % exercises
C00050 00009 \section{Equality and other \lisp\ predicates.}
C00056 00010 \section{Defining functions and predicates on lists.}
C00061 00011 \section{\lisp\ programs as \lisp\ data}
C00068 00012 \section{\lisp\ terms}
C00074 ENDMK
C⊗;
\chapter{Introduction to Lisp}
\chaplab{readin}
\lisp\ is a programming language for computing with the
symbolic expressions that arise in artificial intelligence and
for computing with mathematical formulas. We begin with pure \lisp,
the core of the language.
\section{The basic ideas of pure \lisp.}
\sectlab{pure}
Pure
\lisp\ is best introduced rather abstractly. The data
of \lisp\ are called S-expressions. Call the set of them \qS!.
Begin with the set of atoms; call it \qA!. For our
abstract approach it doesn't matter what the atoms are. We only
need to be able to tell whether an S-expression is an atom and to
tell whether two atoms are the same. In this section we'll
use capital letters, e.g. |A|, |B|, etc., for atoms.
Because parentheses are used in writing S-expressions,
we use square brackets for applying a function to arguments.
Thus $\qat[x] = |T|$ if $x$ is an
atom. However, we usually reduce the number
of brackets by omitting them for functions of one argument which
gives simply $\qat x = |T|$.
S-expressions are built up beginning with atoms according to
the following rule.
{\it An atom is an S-expression, and an ordered pair of
S-expressions is an S-expression. All S-expressions are formed
in this way.}
Another way of expressing this fact is with the formula
$$\qS! = \qA! + \qS!\times\qS!.$$
We denote ordered pairs by surrounding the items with
parentheses and putting a period surrounded by spaces between
them. Thus
$$\displaylines{
|A|\cr
|(A . B)|\hbox{ and}\cr
|((A . A) . (C . D))|\cr}$$
denote S-expressions, i.e. members of \qS!.
From this abstract point of view we need say nothing about
how S-expressions are represented in the memory of a computer
or on paper. The capital letters for atoms and the parentheses
and dots are just examples of a way of naming S-expressions.
We do need names for the basic operations on S-expressions.
The operation that forms pairs is read {\it cons}.
It is often convenient to write this operation as an infix dot, so
that $cons[x,y]$ is equivalent to $x\qcons y$.
The set of pairs is called \qC! from $cons$, i.e. \qC! is
the set of $cons$es. \qC! and \qA! do not
intersect, i.e.
$$\qA! ∩ \qC! = \emptyset,$$
%
where $\emptyset$ is the empty set, i.e. the set with no members.
Inverse to the operation $\qcons$ that forms pairs are the two
operations \qa\ and \qd\ that take the components of a pair. These
operations are read $car$ and $cdr$ respectively. The fact
that these operations are inverse to $cons$ is expressed by the equations
%
$$\qa\ [x\qcons y] = x,$$
%
$$\qd\ [x\qcons y] = y,$$
%
and
$$x\in \qC! ⊃ [x = [[\qa x] \qcons [\qd x]]].$$
Here we have used
the logical symbol $⊃$ meaning {\it implies} and the set-theory
symbol $\in$ meaning {\it is a member of}. We will treat
logical and set-theoretic symbolism more fully later.
We make the the convention that single argument functions
bind more tightly than infixes and relations (like equality)
bind more tightly that logical symbols like implication, so the
last equation can be written without brackets as
%
$$x\in \qC! ⊃ x = \qa x \qcons \qd x.$$
%
It is necessary to distinguish the center dot
operation $\qcons$ that forms S-expressions from
the dot used in writing S-expressions. The former is an operation
that can be executed, i.e. a program that takes $x$ and $y$,
computes $x\qcons y$, whatever $x$ and $y$ may be, while the
dot in $|(A . B)|$ is just a part of the way we write
that one S-expression. Of course, we have
%
$$|A|\qcons |B| = |(A . B)|,$$
%
but we can't write $(x . y)$ with $x$ and $y$ as variables.
With what we have so far we can only write trivial facts,
but to check your understanding, notice that the following are
consequences of what we have so far.
%
$$|(B . C)| = \qd|(A . B)|\qcons \qa|(C . D)|.$$
%
$$\qat\qad[|(A . (B . C))|].$$
%
To write pure \lisp\ programs, we need two more things ---
conditional expressions and recursion.
The expression
%
$$\qif p \qthen a \qelse b$$
%
is called a {\it conditional expression} and is evaluated as follows:
First evaluate $p$. If $p$ is true, then the value
of the conditional expression is that of $a$. Otherwise its
value is that of $b$. We will discuss exactly what ``true'' means
in \lisp\ in section \sectref{equality}.
Now we can define a simple pure \lisp\ program for reversing
the \qa-parts and \qd-parts of an S-expression and its subexpressions.
We'll call our function /flip/, and we want, for example, that
%
$$/flip/[|((A . B) . C)|]= |(C . (B . A))|.$$
%
The \lisp\ program is defined by
%
$$/flip/ x ← \qif \qat x \qthen x
\qelse /flip/ \qd x\qcons /flip/\qa x.$$
%
If you're still shaky on the conventions for omitting brackets, note that
this is equivalent to
%
$$/flip/[x] ← [\qif \qat[x] \qthen x \qelse /flip/[\qd[x]]\qcons /flip/[\qa[x]]],$$
%
but from now on the bracket omission conventions will be used without further
explanation.
The $←$ means that the left hand side is evaluated by evaluating
the right hand side. The recursion means that function being defined,
in this case /flip/, may occur on the right hand side (but with different
arguments if the evaluation is to succeed).
For example, we have
%
$$\ialign{\hfil$#$&$ ← #$\hfil\cr
flip |A| & \qif \qat |A| \qthen |A| \qelse flip \qd |A|\qcons flip \qa |A|\cr
&|A|,\cr
}$$
%
since $\qat |A|$ is true, not reaching (as the lawyers put it) the issue
of what $\qd |A|$ might be. Clearly $flip$ applied to any atom will
be just that atom. In a more complicated case we have
%
$$\eqalign{%
/flip/[|(A . (B . C))|]&← \qif \qat |(A . (B . C))| \qthen |(A . (B . C))|\cr
& \xx3 \qelse /flip/\qd |(A . (B . C))|\qcons /flip/\qa |(A . (B . C))|\cr
&←/flip/\qd |(A . (B . C))|\qcons /flip/\qa |(A . (B . C))|\cr
&←/flip/|(B . C)|\qcons /flip/ |A|\cr
&←[\qif \qat |(B . C)| \qthen |(B . C)| \cr
&\ \qelse /flip/\qd |(B . C)|\qcons /flip/\qa |(B . C)|]\qcons |A|\cr
&←[/flip/|C|\qcons /flip/|B|]\qcons |A|\cr
&←[|C|\qcons |B|]\qcons |A|\cr
&←|(C . B)|\qcons |A|\cr
&←|((C . B) . A)|.\cr
}$$
The function definition
%
$$\eqalign{
subst[x,y,z] ← &\qif \qat z \qthen [\qif z\qequal y
\qthen x \qelse z]\cr
&\qelse /subst/[x,y,\qa z]\qcons /subst/[x,y,\qd z]\cr}$$
%
gives the result of substituting the S-expression $x$ for the
atom $y$ in the S-expression $z$. For example,
%
$$subst[|(A . B)|,|C|,|((C . D) . C)|] = |(((A . B) . D) . (A . B))|.$$
As we shall see in the rest of the first two chapters,
pure \lisp\ programs of arbitrary power can be constructed by
using already defined functions in the definition of new ones.
The simple structure of pure \lisp\ programs, makes it easy to
regard them as formulas of a language of mathematical logic --- a
first order language --- to use the precise mathematical terminology.
Proofs of the properties of these programs are then straightforward,
and such proofs can even be put into a form in which they can be
checked by a computer program. Even better, an interactive theorem
prover can simultaneously help construct the proof and check its
correctness. You are then only responsible for being sure that
what you proved is what you meant and that you didn't sneak in any
false statements as axioms.
The function $/flip/$ provides a convenient example of proof.
\noindent{\bf Theorem:} $/flip/ /flip/ x = x$.
\noindent{\bf Proof:} Consider the set $goodflip$
of S-expressions $x$ for which the
theorem is true. Clearly $goodflip$ contains all atoms, because if
$x$ is an atom, $/flip/x = x$, and hence $/flip/ /flip/ x = x$.
Now suppose $goodflip$ contains the S-expressions $x$ and
$y$. We have
%
$$\eqalign{%
/flip//flip/[x\qcons y]&←/flip/[\qif \qat[x\qcons y] \qthen x\qcons y
\qelse /flip/\qd[x\qcons y]\qcons/flip/\qa[x\qcons y]]\cr
&←/flip/[/flip/y\qcons/flip/x]\cr
&←\qif\qat[/flip/y\qcons/flip/x]\qthen/flip/y\qcons/flip/x \cr
& \xx3 \qelse/flip/\qd[/flip/y\qcons/flip/x]\qcons/flip/\qa[/flip/y\qcons/flip/x]\cr
&←/flip//flip x/\qcons/flip//flip/y\cr
&←x\qcons y,\cr
}$$
%
which shows that $goodflip$ also contains $x\qcons y$. Now look
back at the definition of the set \qS! of S-expressions. It is the
least set that contains all atoms and whenever it contains $x$
and $y$ also contains $x\qcons y$. Therefore, $goodflip$
contains \qS!, i.e. the relation
$/flip//flip/ x = x$ is true for all S-expressions $x$.
More interesting properties of computer programs admit similar,
though usually longer, proofs. However, such informal
proofs are not checkable by computer. For simple properties like
this, the computer checkable form to be discussed in
\chapref{provin} is much shorter. Also this proof glosses
over a number of difficulties that
require the more detailed treatment given in that chapter.
This completes our preliminary description of pure \lisp\ and
its proof methods. The description is incomplete in that it doesn't
take proper account of the fact that the computation described
by a recursive definition sometimes doesn't terminate, and whether
it terminates may depend on the order in which the computations are
done. These matters will be discussed later on and the formalism
elaborated --- at the cost of complication that would interfere with
understanding if done in this section.
We now turn to actual \lisp\ programming systems. This
book emphasizes \clisp, but almost everything applies to
other \lisp\ dialects.
S-expressions are convenient for giving the abstract
properties of \lisp, and, as we shall see, they are easily
and naturally
represented in the memory of a computer. However, for
most interesting kinds of symbolic information,
the special case of
lists is more convenient. We will show how information is represented
by lists and how lists are represented by S-expressions.
First we treat atoms.
\section{Atoms.}
\sectlab{atoms}
This section describes the simpler kinds of atoms
and their representation by sequences of characters. It would
be confusing at this point to introduce all the different kinds
recognized by \clisp\ and other \lisp\ systems. What we will say about atoms
in this section is correct, we think, but is somewhat delicately
worded to avoid both misstatement and premature complication.
You can always refer to the reference manual for \clisp\ or
other \lisp\ implementation.
An atom is written as a
sequence of capital letters, digits, and special characters with
certain exclusions. The exclusions are {\it space}, {\it carriage return},
and the other non-printing characters, and also the parentheses,
brackets, semi-colon, and comma. Numbers are expressed as signed
decimal or octal numbers, the exact convention depending on the
implementation. Floating point numbers are written with decimal
points and, when appropriate, an exponent notation depending on the
implementation. The reader should consult the programmer's manual
for the \lisp\ implementation he intends to use.
All \clisp\ implementations are supposed to adhere to the conventions
of the {\it Common Lisp Reference Manual} (Steele, et. al. 1984).
Some further examples of atoms are shown in figure \figref{fiia}.
\beginfig{h}{Examples of atoms.}
\figlab{fiia}
$$\vbox{\smallskip\halign{ # \hfil\cr
|A|\cr
|ONION|\cr
|ATOM|\cr
|CONS|\cr
|THE-LAST-TRUMP|\cr
|A307B| \cr
|345| \cr
|-47| \cr
|-45.21| \cr
|3.14159| \cr
}}$$
\endfig
In \clisp, but not in most others, |A.B| is an atom.
\section{Lists.}
\sectlab{lists}
The lists \qL! are a subdomain of the S-expressions \qS!.
We begin with examples.
Figure \figref{fi} shows some examples of lists.
The list (i) has four elements, one of which is repeated and
one of which is itself a list.
The list (ii) also has one element which itself is a list.
The list (iii) has an element, namely $|(A . B)|$, which is neither
an atom nor a list.
The list (iv) has no elements; it is also written $|NIL|$.
In \clisp\ (v) is a list of one element; in most other \lisp s
it is a non-list S-expression.
\beginfig{h}{Examples of lists.}
\figlab{fi}
$$\vbox{%
\ialign{\hfil#&\quad #\hfil\cr
(i) & |(A B (C D) B)|\cr
(ii) & |((A B C D))|\cr
(iii) & |((A . B) C D)|\cr
(iv) & |()|\cr
(v) & |(A.B)|\cr
}}$$
\endfig
Figure \figref{fii}
shows how symbolic information can be represented by lists.
Each example consists of a list and a ``mathematical'' representation of
the same information. Examples (i)-(iv) represent familiar mathematical
and logical expressions. The list (v)
is used to represent the network or graph shown according to a scheme
whereby there is a sublist for each vertex consisting of the vertex
itself followed by the vertices to which it is connected.
The elements of a list are surrounded by parentheses and
separated by spaces. A list may have any number of terms and any of
these terms may themselves be any S-expressions. In this case, the spaces
surrounding a sublist may be omitted, and extra spaces between
elements of a list are allowed. Thus
|(A B(C D) E)| and |(A B (C D) E)|
are slightly different ways of writing the same list.
\beginfig{t}{Information represented as lists.}
\figlab{fii}
\figsize
\medskip
\openup 1\jot
\ialign{\qquad\hfil#&\qquad#\hfil\cr
(i) & |(+ X Y)|\cr
& $x + y$\cr
\noalign{\smallskip}
%
(ii) & |(+ (* X Y) X 3)|\cr
& $xy + x + 3$ \cr
\noalign{\smallskip}
%
(iii) & |(EXIST X (ALL Y (IMPLIES (P X) (P Y))))|\cr
& $∃x.∀y.[P(x)\limp P(y)]$ \cr
\noalign{\smallskip}
%
(iv) & |(INTEGRAL 0 INFINITY (* (EXP (* I X Y)) (F X)) X)|\cr
& $\int↑{\infty}_0 e↑{ixy}f(x)dx$\cr
\noalign{\smallskip}
%
(v) & |((A B) (B A C D) (C B D E) (D B C E) (E C D F) (F E))|\cr
&\vtop
{\ialign{\hbox to 40pt{#\hfil}&&\hbox to 40pt{#\hfil}\cr\noalign{\offinterlineskip}
&&C\cr
&&\special{point 24}\cr
\strut\cr
&B&&E\cr
A \special{point 25}&\special{point 26}&&\special{point 27}&\special{point 28} F\cr
\strut\cr
\strut\cr
&&\special{point 29}\cr
&&D\cr
}}
\special{join 12800 25 26 24 27 28}
\special{join 12800 26 29 27}
\special{join 12800 24 29}
\cr
\noalign{\smallskip}
}
\endfig
Representing symbolic expressions by \lisp\ lists should
be compared with representing it as strings of characters. On
the one hand, strings of characters are the notation commonly
used on paper. However, to compute with them, they have to
be parsed. Thus to know that the expression $\sin(2x+y) + a$ is
a sum, a program must look for a $+$ sign that
is not within a parenthesized subexpression.
The customary \lisp\ representation of the same information
may be regarded as already parsed. The first element of the list
representing a sum is the $+$ sign, and the arguments
are obtained by single \lisp\ basic functions. If conventional
notation is wanted, it is appropriate to translate the conventional
expressions into \lisp\ lists, compute with them in that form, and
translate the output into the desired conventional form.
This is analogous to the fact that it has proved best to
translate numbers from decimal to binary, compute in binary and
then translate the output into decimal. Indeed the conventional
mathematical symbolic expressions are more analogous to Roman
numerals in the variety and irregularity of the notations used.
\section{Lists as S-expressions}
\sectlab{listsexp}
The domain \qN! of natural numbers begins with 0,
and the domain \qL! of
lists begins with the null list that has no members. We can write it
$|()|$, but it is also written as the atom $|NIL|$, since that atom has
traditionally been used to represent the null list. A special predicate \qn\
tests whether a list is null, i.e. $\qn u$ is the assertion that the list
$u$ has no elements.
More generally, the list $(m↓1, \cdots ,m↓n)$ is represented
by the S-expression
$$(m↓1 . (m↓2 . \cdots . (m↓n . |NIL|)\cdots)).$$
For example, $|(A B C D)|$ is represented by
$$|(A . (B . (C . (D . NIL))))|.$$
If the list has multiple levels, each sublist is represented using the
same rule. Thus $|(A (B C) D)|$ is represented by
$$|(A . ((B . (C . NIL)) . (D . NIL)))|.$$
We can regard the list notation as an abbreviation for the
corresponding S-expression. The subdomain \qL{p} of
``proper lists'' whose elements
are either atoms or again proper lists is also important.
Thus $|(A (B C))|$ is proper, but $|(A (B . C))|$ isn't.
Since lists are S-expressions, the basic functions on
lists are determined by their action on S-expressions. We have
%
$$\eqalign{
\qa (m↓1 \cdots m↓n) &= \qa (m↓1 . (m↓2 . \cdots (m↓n . |NIL|)\cdots))\cr
&= m↓1}$$
%
and
%
$$\eqalign{
\qd (m↓1 \cdots m↓n) &= \qd (m↓1 . (m↓2 . \cdots (m↓n . |NIL|)\cdots))\cr
&= (m↓2 . (m↓3 . \cdots (m↓n . |NIL|)\cdots))\cr
&= (m↓2 \cdots m↓n)\cr}.$$
We see that $\qa u$ is the first element of the list $u$, and
$\qd u$ is the rest of the list after the first element has been deleted.
As examples we have
%
$$\qa |(A B C)| = |A|,$$
%
$$\qa |(A)| = |A|,$$
%
$$\qd |(A B C)| = |(B C)|,$$
%
and
%
$$\qd |(A)| = |()| = |NIL|.$$
The symmetry of $\qa$ and $\qd$ as
functions of S-expressions is broken by our choice of how we represent
lists as pairs. Had we chosen the opposite convention, $\qd$ would
give the last element of a list and $\qa$ would give the sublist consisting
of all elements but the last.
As for $cons$, we have
%
$$\eqalign{
m↓1\qcons |(|m↓2 \cdots m↓n|)| &= m↓1\qcons
|(|m↓2 . \cdots |(|m↓n . |NIL||)|\cdots|)|\cr
&= |(|m↓1 . |(|m↓2 . \cdots . |(|m↓n . |NIL||)|\cdots|))|\cr
&= |(|m↓1 \cdots m↓n|)|\cr
}.$$
Thus $cons$ puts its first argument on the front of the list
designated by its second argument. Some people write
$cons[x,u]$ instead of $x\qcons u$. Informally people speak of
$cons$ing an element onto the front of a list.
Examples of this are
%
$$\eqalign{
|A|\qcons |(B C D)| &= |(A B C D)|\cr
|(A)|\qcons |(B C D)| &=|((A) B C D)|\cr
|A|\qcons|NIL| &= |(A)|\cr
}.$$
Suppose we want to form the expression that represents the
sum of the expression $x$, the expression $y$ and the number 3, i.e.
we want the operations that form the \lisp\ equivalent of $x+y+3$
from its variable parts $x$ and $y$. By variable here we mean that
that the program that forms $x+y+3$ may be used with different $x$
and $y$ each time it is executed. This may be done by the \lisp\
expression $|+| \qcons [x\qcons[y\qcons |(3)|]]$. If we wanted
$x+3+y$, we would write $|+|\qcons[x\qcons[3\qcons[y\qcons |NIL|]]]$.
In the former case the {\it tail} of the list starting with 3 is
constant and doesn't have to be $cons$ed up each time the program
is executed.
Actually \lisp\ allows the abbreviation $<+ x y 3>$ for the
former expression. Thus $<m↓1, \cdots,m↓n>$ is the operation that
forms the list $(m↓1 \cdots m↓n)$ from its constituent expressions.
The terms that select components of a list or S-expression
also have abbreviations. For example $\qa\,\qd\,\qd u$, which gives
the third element of a list may be written $\qadd u$. Thus $\qa u$,
$\qad~u$, $\qadd~u$ and $\qaddd~u$ give the first four elements of the
list $u$. $\qd u$, $\qdd u$, etc. give successive sublists. Most \lisp\
implementations include the functions |CAAR|, |CADR|, |CDAR|,
|CADDR|, and so on, as
abbreviations for chains of up to four |CAR|s and |CDR|s.
\section{List structures.}
\sectlab{liststruc}
The details of this section are mostly unnecessary for doing the
programs of this and the next chapter,
because most of the time it is simpler think
about \lisp\ data in a more abstract way. However, some \lisp\
conventions might seem rather peculiar to a person who completely
ignores how data is represented in the computer.
Atoms and S-expressions (including lists)
are represented in the memory of the computer by
list structures. A list structure is a collection of memory units called
{\it cons-cells}.
A cons-cell generally consists of one or possibly two consecutive computer
words. It is divided into two parts, the \qa-part and the \qd-part,
with each part being capable of containing an address in memory.
If $x$ is the S-expression represented by the cons-cell, then
the \qa-part of the cell contains the address of $\qa x$, and
the \qd-part of the cell contains the address of $\qd x$.
The list structure representing a list contains a cons-cell for each
element of the list. The \qa-part of the cell contains the address
of the list structure representing the element and the \qd-part contains
the address of the cell for the next element of the list.
These addresses are sometimes called pointers as they ``point'' to
the component list structures.
A diagram shows this more clearly than words. Figure \figref{fiii} shows
the list structure corresponding to the list |(+ (* X Y) X 3)|
(which might represent the expression $xy + x + 3$).
\beginfig{h}{Representation of |(+ (* X Y) X 3)| as a list structure.}
\figlab{fiii}
{\offinterlineskip\parskip=0pt
\leavevmode
\→{15pt}\xcell{\car+}\→{20pt}\xcell{\markcar0}%
\→{20pt}\xcell{\markcar1}\→{20pt}\xcell{\car3}\raise2pt\hbox{\special{point 10}%
\hskip45pt\special{point 5}}
\hskip85pt\raise2pt\hbox{\special{point 2}}%
\→{15pt}\xcell{\car*}\→{20pt}\xcell{\markcar3}%
\→{20pt}\xcell{\vtop{\hbox{\hskip30pt\special{point 4}\hskip-30pt}%
\hbox to 40pt{\hfil\↓Y\hskip 0pt plus 3fil}}}
\hskip145pt\special{point 6}\hskip25pt\special{point 7}\hskip80pt\special{point 8}%
\hskip30pt\special{point 9}
\hskip145pt\hbox to 25pt{\hfil\↓X\hfil}\hskip80pt\hbox to 30pt{\hfil\↓{NIL}\hfil}
\special{join 12000 0 2}\special{join 12000 1 6 7 3}\special{join 12000 10 5 9 8 4}
}
%$$ ?? |(+ (* X Y) X 3)| ??$$
%
% ⊂αααπααα⊃ ⊂αααπααα⊃ ⊂αααπααα⊃ ⊂αααπααα⊃
% ααααα→~ ~ εαααα→~ ~ εαααα→~ ~ εαααα→~ ~ εααααααα⊃
% %απα∀ααα$ %απα∀ααα$ %απα∀ααα$ %απα∀ααα$ ~
% ↓ ~ ~ ↓ ~
% + ~ ~ 3 ~
% ~ ⊂αααπααα⊃ ~ ⊂αααπααα⊃ ⊂αααπααα⊃ ~
% %α→~ ~ εααβα→~ ~ εαααα→~ ~ ~ ~
% %απα∀ααα$ ~ %απα∀ααα$ %απα∀απα$ ~
% ↓ ~ ~ ↓ ~ ~
% * %απαα$ Y %ααπα$
% ↓ ↓
% X NIL
\endfig
Part of the efficiency of \lisp\ comes from the fact that
programs communicate S-expressions to each other using the address
of the lead cell of the list structure rather than by moving
strings.
Thus the \qa-part of this
cell is the first element of the list and the \qd-part is a pointer to a
sublist formed by deleting the first element. The first word of
the list structure of figure
\figref{fiii} contains a pointer to the list
structure representing the atom |+|, while its \qd-part points to the
list |((* X Y) X 3)|. The second word contains the list structure
representing |(* X Y)| in its \qa-part and the list structure
representing |(X 3)| in its \qd-part. The last word points to the atom
|3| in its \qa-part and has a pointer to the atom \qNIL\ in its \qd-part.
This is consistent with the convention that \qNIL\ represents the null list.
Atoms are represented by the addresses of their property
lists which are list structures of a special kind depending on the
implementation.
(In some implementations, the first word of a
property list is in a special area of memory, in others the first word
is distinguished by sign, in still others it has a special \qa-part.
For basic \lisp\ programming, it is enough to know that atoms are
distinguishable from other list structures by a predicate called
\qat.)
Each atom is represented uniquely, so atoms can be tested for equality
just by testing the pointers. In the diagram, we don't give the structure
of property lists but just label them with their {\it print names}.
A partial dump of the memory of a computer containing the list
structure of figure \figref{fiii}
might look as shown in figure
\figref{fiv}.
Here X denotes the address of the property list of the atom
named |X| and |/3| denotes that of the atom named |3|.
Notice that the addresses of consecutive list elements need
not be consecutive.
Also the last word of a list cannot have the address of a next
word in its \qd-part since there isn't any next word, so it has the
address of a special atom called \qNIL\ which also represents the
null list.
\beginfig{h}{A memory map for the list structure |(+ (* X Y) X 3)|.}
\figlab{fiv}
\figsize
$$\vbox{%
\halign{\qquad \hfil # &\quad\hfil # &\quad \hfil # &
\qquad \hfil # &\quad\hfil # &\quad \hfil # \cr
%
address & \qa-part & \qd-part & address & \qa-part & \qd-part \cr
\noalign{\vskip-\smallskipamount}
\hrulefill&\hrulefill&\hrulefill& \hrulefill&\hrulefill&\hrulefill \cr
\noalign{\medskip}
86 & * & 87 & 1000 & + & 1001 \cr
87 & X & 88 & 1001 & 86 & 1002 \cr
88 & Y & NIL & 1002 & X & 2653 \cr
& & & 2653 & |/3| & NIL \cr
}}$$
\endfig
Notice that the same list may be represented by two
different list structures in memory.
%For example, the list structures
%$x$ and $y$ of figure \figref{zzz} both represent the list
%|(A (B C) (B C)|, and the two list structures share parts with each other
%and internally.
In constructing list structures that
represent lists, it would take too
much computer time to check the possibility that a list structure
representing the list might already exist in memory.
However, the sharing of structure, when it occurs, saves both time
and space. An important consequence, to be discussed later, is that
equality of lists and sameness of list structures don't
always coincide.
% exercises
\exercise If we represent sums and products as indicated above and
use |(- X)|, |(/ X Y)|, and |(EXPT X Y)| as representations
of $-x$, $x\div y$, and $x↑y$ respectively, then (a) what do the following lists represent?
%
$$|(/ 2 (EXPT (+ X (- Y)) 3))|$$
$$|(+ -2 (- 2) (* X (EXPT Y 3.3)))|$$
%
and (b) How are the expressions $xyz+3(u+v)↑5-3$
and $(xy-yx)\div (xy+yx)$ to be represented?
\exercise In the above mentioned graph notation, what graph is
represented by the following list?
$$|((A D E F) (B D E F) (C D E F) (D A B C) (E A B C) (F A B C))|$$
\exercise Write the list |(+ (* X Y) X 3)| as an S-expression.
This is sometimes referred to as ``dot-notation''.
\exercise Write the following S-expressions in list notation to
whatever extent is possible:
%
$$\vbox{\smallskip\halign{\hfil #&\quad # \hfil\cr
a.& |(A . NIL)|\cr
b.& |(A . B)|\cr
c.& |((A . NIL) . B)|\cr
d.& |((A . B) . ((C . D) . NIL))|.\cr
}}$$
\exercise How would you represent polynomials in one variable as lists?
Can you think of more than one useful way? How would you represent polynomials
in several variables? Can you tell if two lists represent the same
polynomial?
\section{Equality and other \lisp\ predicates.}
\sectlab{equality}
There is an important difference between \lisp\ predicates and those
of mathematical logic. Logical predicates always have
the value $true$ or $false$.
However, in \lisp\ or any system in which predicates or the terms to
which they are applied may be computed
by arbitrary programs, there is a third possibility --- the computation
may not terminate.
Therefore, \lisp\ predicates are treated as taking S-expressions
as values and not merely $true$ or $false$. Moreover, \lisp\ interprets
the value $|NIL|$ as $false$ and any other value that may be returned
as $true$. This is a convenience in programming, although it somewhat
complicates the logical treatment of \lisp. The definition of $flip$
in section \sectref{pure} is correct, because the conditional expression
$\qif p \qthen a \qelse b$ takes accepts and arbitrary S-expression
for $p$. However, the axioms and the statement of of correctness
require a reformulation that will be given in chapter \chapref{provin}. The
statement that $flip flip x = x$ is still valid, and the idea of
the proof is correct.
In chapter \chapref{provin} we discuss techniques for proving
properties of \lisp\ programs using mathematical logic. We allow
for the possibility of non-termination by supposing that a
\lisp\ expression has special value $\bot$ (read {\it bottom})
that {\it is not an S-expression}
in case the computation doesn't terminate.
Equality presents an additional problem, because the same
S-expression can be represented in memory by two different list
structures. Therefore, \lisp\ has two equality predicates.
$x \qeq y$ has the value $|T|$ if $x$ and $y$ are the
same list structure. If $x \qeq y = |T|$, then $x$ and $y$ certainly
represent the same S-expression, but the converse is false.
The second equality predicate $x \qequal y$ has value $|T|$
if $x$ and $y$ are the same S-expression. \qequal\ can be defined
by a program in terms of \qeq. However, \qequal\ isn't the same
as the logical predicate of equality denoted by =. Namely, if
the computation of $exp↓1$ or of $exp↓2$ doesn't terminate,
then $exp↓1 \qequal exp↓2$ has the value $\bot$. However,
the logical expression $x = y$ can only be $true$ or $false$.
Moreover, if neither $x$ nor $y$ terminate, then $x = y$ reduces
to $\bot = \bot$, which is true. Thus we have
$$[x\qequal\bot] = [\bot\qequal x] = \bot.$$
\qat, \qn\ and \qequal\ all give \qT\ or \qNIL\ when their
arguments are defined, but they can occur applied to terms whose
computation may not terminate.
In most of our theoretical and practical work we can regard
ourselves as computing with S-expressions and not with list structures.
In this case the significant predicate is \qequal\ rather than \qeq.
For now you can get by by remembering that \qequal\ is not
quite the same as =, and that the difference will be explained again
later.
\lisp\ uses logical connectives \qand, \qor\ and \qnot\ (read
{\it and}, {\it or} and {\it not}) which are defined as follows
in terms of conditional expressions.
%
$$\vbox{\ialign{\hfil $#$&$\>=#$\hfil\cr
p \qand q & \qif p \qthen q \qelse \qNIL\cr
p \qor q & \qif p \qthen p \qelse q\cr
\qnot p & \qif p \qthen \qNIL \qelse \qT\cr
}}.$$
%
Notice that their values can be other than $|T|$ and $|NIL|$
as well as being potentially undefined.
\section{Defining functions and predicates on lists.}
\sectlab{functions}
The function $alt$ defined by
%
$$alt\, u ← \qif \qn u \qor \qn \qd u \qthen u \qelse \qa u \qcons alt \qdd u$$
%
has as value a list $alt u$ containing every other element of $u$
starting with the first element. Thus
%
$$alt[|(A B C D E)|] = |(A C E)|.$$
%
We illustrate its computation by
%
$$\eqalign{
alt |(A B C D E)| &= |A|\qcons alt[|(C D E)|]\cr
&= |A|\qcons[|C|\qcons alt[|(E)|]]\cr
&= |A|\qcons[|C|\qcons |(E)|]\cr
&= |(A C E)|.\cr
}\hfil$$
Since the termination case of $alt$ is when its argument is
\qNIL\ or its \qd-part is \qNIL, $alt$ doesn't work properly if
$u$ is an S-expression that isn't a list.
The termination test won't ever be satisfied, and $alt$ will try to
take the \qd-part of an atom. In most \lisp s this will result in an
error message. It is common in \lisp\ for a function to be defined
in such a way that it works only for arguments of a certain kind.
A much used function is $append$. We prefer to write
$append[u,v]$
with an infix $*$, i.e. as $u * v$ instead of $append[u,v]$. It
satisfies the definition
%
$$u * v ← \qif \qn u \qthen v \qelse \qa u \qcons [\qd u * v],$$
%
and appends the list $u$ onto the front of the list $v$. Thus
%
$$\eqalign{
%
\qNIL * |(A B)| &= |(A B)|,\cr
%
|(A B)| * |(C D)| &= |(A B C D)|,\cr
%
\noalign{\noindent{\rm and}}
%
|(A)| * |(B C)| &= |(A B C)| = |A|\qcons |(B C)|.\cr
}$$
$append$ has a number of interesting algebraic properties which we will
prove in chapter \chapref{provin}. For example,
%
$$u * [v * w] = [u*v]*w.$$
%
This property is called associativity and enables us to write simply $u*v*w$
without any bracketing. Also
%
$$\qNIL*u = u*\qNIL = u;$$
%
i.e. \qNIL\ is a unit element for the $append$ operation on lists.
The last of the above examples illustrates the fact
that appending a list of one element
onto the front of another list is equivalent to $cons$ing the
element onto the list.
\section{\lisp\ programs as \lisp\ data}
\sectlab{internal}
In this chapter so far we have used \lisp\ {\it external
notation} which takes advantage of typographical devices to
express \lisp\ programs concisely and clearly.
The notation is also compatible with the mathematical logical
notation used in chapter \chapref{provin}. Notations similar to external
notation but in a single type font have been used as input
to some \lisp\ systems but have not proved popular.
Instead most \lisp\ systems including \clisp\ use
{\it internal notation} which represents \lisp\ programs
as \lisp\ lists. Internal notation, while somewhat wordy,
is quite writable and readable and is
especially appropriate when programs are to be written to
manipulate \lisp\ programs. Such programs include interpreters,
compilers, macro expanders, and parts of AI programs that
produce programs associated with particular objects.
There are also high level languages that compile into \lisp\ internal
notation.
The jargon of \lisp\ hackers is based on the internal
notation. One speaks of $cdr$ing down a list and of $cons$ing
up a structure of some kind.
Internal notation represents \lisp\ programs as lists
according to the following conventions. We write $exp↑*$ for
the internal form of the external expression $exp$.
1. Terms are written as lists with the
function or operator name as the first element and its arguments as the
remaining elements of the list. No infixes are used.
2. The basic functions have conventional names different
from those used in external notation.
$$\eqalign{
\qa\qquad&|CAR|\cr
\qd\qquad&|CDR|\cr
\qad\qquad&|CADR|, {\rm etc.}\cr
\qcons\qquad&|CONS|\cr
\qeq\qquad&|EQ|\cr
\qequal\qquad&|EQUAL|\cr
\qat\qquad&|ATOM|\cr
\qn\qquad&|NULL|\cr
}$$
3. Internal notation is insensitive to case. We use upper
case in this book.
4. S-expression constants must be quoted with certain exceptions.
The S-expression $e$ is written $|(QUOTE e)|$.
Thus $|(PLUS A B)|$ is internally represented as $|(QUOTE (PLUS A B))|$.
In input to \lisp\ one can write $'|(PLUS A B)|$ as an abbreviation
of $|(QUOTE (PLUS A B))|$,
but \lisp\ uses the form with $|QUOTE|$ internally.
The exceptions include numbers, $|T|$ and $|NIL|$.
5. The conditional $\qif p \qthen a \qelse b$ is written
$|(IF| p↑* a↑* b↑*|)|$.
There is a second much used internal
form for conditional expressions.
%
$$|(COND (|p↓1↑* e↓1↑*|) (|p↓2↑* e↓2↑*|)| \cdots |(|p↓n↑* e↑*↓n|))|$$
%
represents
%
$$\qif p↓1 \qthen e↓1 \qelseif p↓2 \qthen e↓2 \cdots \qelseif p↓n \qthen e↓n.$$
%
Unless $p↓n$ is $|T|$ there is a possibility of ``falling off the end'' of
the conditional expression.
6. A \lisp\ definition of the form
%
$$foo[arg↓1,\cdots,arg↓n] ← expression$$
%
is written
%
$$|(DEFUN FOO (|arg↓1↑* \cdots arg↓n↑*|)| expression↑*|)|.$$
Putting these conventions together, the definition
%
$$alt\,u ← \qif \qn u \qor \qn\,\qd u \qthen u \qelse \qa u\qcons alt \qdd u$$
%
has the internal form
\begindefun
(DEFUN ALT (U)
(IF (OR (NULL U) (NULL (CDR U)))
U
(CONS (CAR U) (ALT (CDDR U))))).
\enddefun
Here are more examples.
\noindent 1. We have already seen $append$.
%
$$u*v ← \qif \qn u \qthen v \qelse \qa u \qcons [\qd u * v]$$
%
\begindefun
(DEFUN APPEND (U V)
(IF (NULL U)
V
(CONS (CAR U) (APPEND (CDR U) V))))
\enddefun
\noindent 2. \qequal\ can be defined using \qeq. Notice that this
definition makes essential use of the left-to-right order of evaluation
of the arguments of \qand\ and \qor\ which follows from their conditional
expression definitions. But for that the recursion wouldn't work.
$$x\qequal y ← x\qeq y \qor
[\qnot\qat x\qand\qnot\qat y\qand\qa x\qequal\qa y\qand\qd x\qequal\qd y]$$
\begindefun
(DEFUN EQUAL (X Y)
(OR (EQ X Y)
(AND (NOT (ATOM X))
(NOT (ATOM Y))
(EQUAL (CAR X) (CAR Y))
(EQUAL (CDR X) (CDR Y)))))
\enddefun
\section{\lisp\ terms}
\sectlab{terms}
Let us summarize the kinds of \lisp\ terms we have seen
in both external and internal notation. In external notation we
shall give both the prefix and infix notation emphasized in this
book and the ordinary function notation used in some other literature.
Terms are expressions built up from variables, constants,
function symbols and conditionals. They have values that depend on
the values of the variables that occur in them. Some examples
of terms in external and internal form are given in figure \figref{ti}.
\beginfig{h}{%
Examples of \lisp\ terms in external and internal notation and their values.}
\figlab{ti}
\figsize
\medskip
\halign{#\hfil&\quad$#$\hfil&\quad # \hfil&\quad #\hfil\cr
& {\rm External\ form} &{\rm Internal form}&{\rm Value denoted}\cr
\noalign{\vskip-\smallskipamount}
& \hrulefill & \hrulefill & \hrulefill \cr
\noalign{\medskip}
(i) &|(A . B)|&|(QUOTE (A . B))|&|(A . B)|\cr
&x&|X|&|(A . B)|\cr
&|(CAR X)| &|(QUOTE (CAR X))|&|(CAR X)|\cr
\noalign{\medskip}
(iia) & \qa |(A . B)| &|(CAR (QUOTE (A . B)))|&|A|\cr
(iib) & \qa x & |(CAR X)|&|A|\cr
(iic) & \qd |(A . B)| &|(CDR (QUOTE (A . B)))|&|B|\cr
(iid) & |A| \qcons |B| &|(CONS (QUOTE A) (QUOTE B))|&|(A . B)|\cr
(iie) & |A| \qcons y & |(CONS (QUOTE A) Y)| &|(A . C)|\cr
\noalign{\medskip}
(iiia) & \qa |((A . B) . A)| &|(CAR (QUOTE ((A . B) . A)))|&|(A . B)|\cr
(iiib) & \qd |((A . B) . A)| &|(CDR (QUOTE ((A . B) . A)))|&|A|\cr
(iiic) & |(A . B)| \qcons |A| &|(CONS (QUOTE (A . B)) (QUOTE A))|&|((A . B) . A)|\cr
\noalign{\medskip}
(iva) & \qa |(A B C D)| &|(CAR (QUOTE (A B C D)))|&|A|\cr
(ivb) & \qd |(A B C D)| &|(CDR (QUOTE (A B C D)))|&|(B C D)|\cr
(ivc) & |A| \qcons |(B C D)| &|(CONS (QUOTE A) (QUOTE (B C D)))|&|(A B C D)|\cr
(ivd) & y\qcons v& |(CONS Y V)| &|(C D E)|\cr
\noalign{\medskip}
(v) & \qat |A| &|(ATOM (QUOTE A))|&|T|\cr
& \qat |(A . B)| &|(ATOM (QUOTE (A . B)))|&|NIL|\cr
\noalign{\medskip}
(vii) & |A| \qeq |A| &|(EQ (QUOTE A) (QUOTE A))|&|T|\cr
& |A| \qeq |B| &|(EQ (QUOTE A) (QUOTE B))|&|NIL|\cr
& x \qequal |(A . B)|& |(EQUAL X (QUOTE (A . B)))|& |T|\cr
& |(A . B)| \qeq |(A . B)| &|(EQ (QUOTE (A . B)) (QUOTE (A . B)))|& ?\cr
\noalign{\medskip}
(viii) & \qif \qat x \qthen\qNIL\qelse y&|(IF (ATOM X) NIL Y))|&|C|\cr
}
\medskip
\hbox{where
$x = |(A . B)|$,
$y = |C|$,
$u = |(A B C)|$, and
$v = |(D E)|$.}
\vbox{\strut}
\hrule
\endfig